Flipkart is one of India's leading e-commerce platforms, offering a wide range of products across categories such as electronics, fashion, home essentials, and more. Founded in 2007, Flipkart has revolutionized the online shopping experience in India, providing millions of customers with convenient access to a diverse selection of products and services.
The Flipkart Sales Analysis Project examines sales data from Flipkart to uncover insights that can inform strategic decisions and support business growth. By analyzing customer demographics, purchasing behavior, and sales performance across product categories and regions, the project aims to give Flipkart's management and marketing teams a clearer picture of who buys what, and where.
The primary objective of this project is to understand customer preferences, market trends, and regional variations in sales performance on the Flipkart platform. By leveraging data analytics techniques, we aim to identify key factors influencing purchasing decisions and explore opportunities for enhancing customer engagement and optimizing sales strategies.
Gender Distribution: Analysis of gender distribution among customers reveals insights into consumer demographics and purchasing power, enabling targeted marketing efforts.
Age Demographics: Examination of sales performance across different age groups highlights segments with significant purchasing influence, guiding product development and marketing strategies.
Product Preferences: Understanding product preferences among male and female consumers helps in optimizing product assortments and tailoring promotional campaigns to specific audience segments.
Regional Variations: Variations in sales performance across different states provide insights into regional market dynamics and opportunities for localized marketing initiatives.
In this project, we leverage Python's data analysis and visualization libraries to clean, preprocess, and analyze Flipkart sales data. By conducting exploratory data analysis and visualizing key metrics, we aim to extract actionable insights that can drive business growth and enhance customer satisfaction on the Flipkart platform. Through meticulous analysis of customer demographics, product preferences, and regional sales trends, this project serves as a valuable resource for Flipkart's strategic planning and decision-making processes.
# Importing necessary libraries for data analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Importing the dataset with 'unicode_escape' encoding
flip_kart_df = pd.read_csv(r"C:\Users\jki\Downloads\flipkart_sales_data.csv", encoding='unicode_escape')
flip_kart_df.head(5)
index | User_ID | Cust_name | Product_ID | Gender | Age Group | Age | Marital_Status | State | Zone | Occupation | Product_Category | Orders | Amount | Status | unnamed1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1002903 | Kanishk | P00125942 | F | 26-35 | 28 | 0 | Maharashtra | Western | Healthcare | Auto | 1 | 23952.0 | NaN | NaN |
1 | 1000732 | Aryan | P00110942 | F | 26-35 | 35 | 1 | Andhra Pradesh | Southern | Govt | Auto | 3 | 23934.0 | NaN | NaN |
2 | 1001990 | Raunak | P00118542 | F | 26-35 | 35 | 1 | Uttar Pradesh | Central | Automobile | Auto | 3 | 23924.0 | NaN | NaN |
3 | 1001425 | Suwarna | P00237842 | M | 0-17 | 16 | 0 | Karnataka | Southern | Construction | Auto | 2 | 23912.0 | NaN | NaN |
4 | 1000588 | Pritam | P00057942 | M | 26-35 | 28 | 1 | Gujarat | Western | Food Processing | Auto | 2 | 23877.0 | NaN | NaN |
flip_kart_df.shape
(11251, 15)
flip_kart_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11251 entries, 0 to 11250
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   User_ID           11251 non-null  int64
 1   Cust_name         11251 non-null  object
 2   Product_ID        11251 non-null  object
 3   Gender            11251 non-null  object
 4   Age Group         11251 non-null  object
 5   Age               11251 non-null  int64
 6   Marital_Status    11251 non-null  int64
 7   State             11251 non-null  object
 8   Zone              11251 non-null  object
 9   Occupation        11251 non-null  object
 10  Product_Category  11251 non-null  object
 11  Orders            11251 non-null  int64
 12  Amount            11239 non-null  float64
 13  Status            0 non-null      float64
 14  unnamed1          0 non-null      float64
dtypes: float64(3), int64(4), object(8)
memory usage: 1.3+ MB
In this section, we clean the data by dropping columns that carry no information. The info() output above shows that 'Status' and 'unnamed1' contain no non-null values, so neither can contribute to the analysis. Removing them leaves only the columns that are relevant.
# Dropping the 'Status' and 'unnamed1' columns from the DataFrame
# Both columns have 0 non-null values in the info() output, so they carry no information
flip_kart_df.drop(['Status', 'unnamed1'], axis=1, inplace=True)
# Checking for null values in the DataFrame and summing them up
pd.isnull(flip_kart_df).sum()
User_ID              0
Cust_name            0
Product_ID           0
Gender               0
Age Group            0
Age                  0
Marital_Status       0
State                0
Zone                 0
Occupation           0
Product_Category     0
Orders               0
Amount              12
dtype: int64
# Dropping rows with null values from the DataFrame
flip_kart_df.dropna(inplace=True)
# Changing the data type of the 'Amount' column to integer
flip_kart_df['Amount'] = flip_kart_df['Amount'].astype('int')
flip_kart_df['Amount'].dtypes
dtype('int32')
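The cleaning steps above can be sketched on a small toy frame. This is only an illustration under stated assumptions: the values are made up, and toy_df and cleaned are hypothetical names that do not appear in the notebook. The sketch lets pandas find the entirely empty columns instead of naming them, and drops only the rows where 'Amount' is missing.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sales data; all values are made up for illustration.
toy_df = pd.DataFrame({
    "User_ID": [1, 2, 3, 4],
    "Amount": [100.0, np.nan, 250.0, 75.0],
    "Status": [np.nan] * 4,      # entirely empty, like 'Status' in the dataset
    "unnamed1": [np.nan] * 4,    # entirely empty, like 'unnamed1' in the dataset
})

# Drop every column that has no non-null values at all.
cleaned = toy_df.dropna(axis=1, how="all")

# Drop only the rows where 'Amount' is missing, then cast it to an integer type.
cleaned = cleaned.dropna(subset=["Amount"]).copy()
cleaned["Amount"] = cleaned["Amount"].astype("int64")

print(cleaned.columns.tolist())
print(len(cleaned))
```

Requesting 'int64' explicitly avoids the platform-dependent result of astype('int'), which produced int32 in the output above.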
flip_kart_df.columns
Index(['User_ID', 'Cust_name', 'Product_ID', 'Gender', 'Age Group', 'Age', 'Marital_Status', 'State', 'Zone', 'Occupation', 'Product_Category', 'Orders', 'Amount'], dtype='object')
# Renaming the column 'Cust_name' to 'Customer_name'
flip_kart_df.rename(columns={'Cust_name': 'Customer_name'}, inplace=True)
# Checking statistical summary of the DataFrame
flip_kart_df.describe()
statistic | User_ID | Age | Marital_Status | Orders | Amount |
---|---|---|---|---|---|
count | 1.123900e+04 | 11239.000000 | 11239.000000 | 11239.000000 | 11239.000000 |
mean | 1.003004e+06 | 35.410357 | 0.420055 | 2.489634 | 9453.610553 |
std | 1.716039e+03 | 12.753866 | 0.493589 | 1.114967 | 5222.355168 |
min | 1.000001e+06 | 12.000000 | 0.000000 | 1.000000 | 188.000000 |
25% | 1.001492e+06 | 27.000000 | 0.000000 | 2.000000 | 5443.000000 |
50% | 1.003064e+06 | 33.000000 | 0.000000 | 2.000000 | 8109.000000 |
75% | 1.004426e+06 | 43.000000 | 1.000000 | 3.000000 | 12675.000000 |
max | 1.006040e+06 | 92.000000 | 1.000000 | 4.000000 | 23952.000000 |
# Generating statistical summary for specific columns: 'Age', 'Orders', and 'Amount'
flip_kart_df[['Age', 'Orders', 'Amount']].describe()
statistic | Age | Orders | Amount |
---|---|---|---|
count | 11239.000000 | 11239.000000 | 11239.000000 |
mean | 35.410357 | 2.489634 | 9453.610553 |
std | 12.753866 | 1.114967 | 5222.355168 |
min | 12.000000 | 1.000000 | 188.000000 |
25% | 27.000000 | 2.000000 | 5443.000000 |
50% | 33.000000 | 2.000000 | 8109.000000 |
75% | 43.000000 | 3.000000 | 12675.000000 |
max | 92.000000 | 4.000000 | 23952.000000 |
flip_kart_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11239 entries, 0 to 11250
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   User_ID           11239 non-null  int64
 1   Customer_name     11239 non-null  object
 2   Product_ID        11239 non-null  object
 3   Gender            11239 non-null  object
 4   Age Group         11239 non-null  object
 5   Age               11239 non-null  int64
 6   Marital_Status    11239 non-null  int64
 7   State             11239 non-null  object
 8   Zone              11239 non-null  object
 9   Occupation        11239 non-null  object
 10  Product_Category  11239 non-null  object
 11  Orders            11239 non-null  int64
 12  Amount            11239 non-null  int32
dtypes: int32(1), int64(4), object(8)
memory usage: 1.2+ MB
In this section, we visualize the distribution of gender within the dataset using a bar chart to understand the count of each gender category.
# Setting up the figure size and background color
plt.figure(figsize=(10, 6), facecolor='#F7A200')
# Setting the title of the plot
plt.title('Gender Count')
# Creating a count plot for gender using seaborn
a = sns.countplot(y='Gender', hue='Gender', data=flip_kart_df, palette='Blues')
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
# Adding labels to the bars
for bars in a.containers:
    a.bar_label(bars)
In this section, we visualize the distribution of total sales by gender using a pie chart to understand the contribution of each gender category to the overall sales.
# Calculating total sales by gender
sales_gender = flip_kart_df.groupby(['Gender'], as_index=False)['Amount'].sum()
# Generating colors for the pie chart
colors = sns.color_palette('Blues', n_colors=len(sales_gender))
# Setting up the figure size and background color
plt.figure(figsize=(9, 6), facecolor='#F7A200')
plt.title('Total Sales By Gender')
# Creating the pie chart
patches, texts, autotexts = plt.pie(sales_gender['Amount'], labels=sales_gender['Gender'], autopct='', colors=colors, pctdistance=0.7)
plt.axis('equal')
# Adding text labels for sales amounts
female_sales = sales_gender[sales_gender['Gender'] == 'F']['Amount'].values[0]
male_sales = sales_gender[sales_gender['Gender'] == 'M']['Amount'].values[0]
plt.text(-0.4, 0, f"{female_sales}", fontsize=12, ha='center', va='center', color='black')
plt.text(0.4, 0, f"{male_sales}", fontsize=12, ha='center', va='top', color='black')
# Adding legend
plt.legend(title='Gender', loc='upper left', bbox_to_anchor=(1, 0.5))
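The pie chart above is drawn with an empty autopct, so it shows raw totals instead of percentages. As a minimal sketch, the share of each gender in total sales could be computed directly from the grouped totals. The data below is made up, and Share_pct is a hypothetical column name introduced only for this example.

```python
import pandas as pd

# Toy data; the amounts are made up for illustration.
toy_df = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "F"],
    "Amount": [300, 200, 100, 150, 250],
})

# Total sales per gender, as in the notebook.
sales_gender = toy_df.groupby("Gender", as_index=False)["Amount"].sum()

# Each gender's share of the overall total, in percent.
sales_gender["Share_pct"] = (
    100 * sales_gender["Amount"] / sales_gender["Amount"].sum()
).round(1)

print(sales_gender)
```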
In this section, we explore how the counts vary across different age groups. We visualize this variation using a bar chart to understand the distribution of counts across age groups, with a further breakdown by gender.
# Setting up the figure size and background color
plt.figure(figsize=(9, 6), facecolor='#F7A200')
plt.title('Age group wise count')
# Creating a count plot for age groups with gender breakdown using seaborn
a = sns.countplot(data=flip_kart_df, x='Age Group', hue='Gender', palette='Blues')
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
# Turning off the grid lines
a.grid(False)
# Adding labels to the bars
for bars in a.containers:
    a.bar_label(bars)
In this section, we compare the total amount spent across age groups. A bar chart of the summed amounts shows which age groups account for the most spending.
# Setting up the figure size and background color
plt.figure(figsize=(9, 6), facecolor='#F7A200')
plt.title('Age Group Wise Total Amount')
# Calculating total amount spent by each age group
sales_age_group = flip_kart_df.groupby(['Age Group'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)
# Creating a bar plot for total amount spent by age group using seaborn
a = sns.barplot(x='Age Group', y='Amount', data=sales_age_group, palette='Blues', errorbar=None)
# Setting the background color of the plot
a.patch.set_facecolor('#f7a200')
# Turning off the grid lines
a.grid(False)
The chart shows that the 26-35 age group contributes the largest share of the total amount spent. Because the chart plots summed amounts, a large total can come from a large number of buyers as well as from high spending per buyer, so it does not by itself establish a correlation between age and spending. Even so, the group's weight in total sales makes it a natural priority for marketing and product decisions. Understanding the preferences and spending habits of customers aged 26-35 can inform marketing strategies, product development, and customer engagement initiatives.
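One way to separate "many buyers" from "high spending per buyer" is to look at the total, the record count, and the mean side by side. The sketch below uses made-up data in which the group with the largest total has the lower mean; toy_df and summary are hypothetical names, and nothing here reflects the actual Flipkart figures.

```python
import pandas as pd

# Toy data; the amounts are made up for illustration.
toy_df = pd.DataFrame({
    "Age Group": ["26-35", "26-35", "26-35", "26-35", "51-55", "51-55"],
    "Amount": [100, 120, 80, 100, 150, 200],
})

# Total, number of records, and mean amount per age group.
summary = (
    toy_df.groupby("Age Group")["Amount"]
    .agg(total="sum", records="count", mean="mean")
    .sort_values("total", ascending=False)
)

print(summary)
```

In this toy example the 26-35 group leads on total only because it has more records, which is the kind of effect a totals-only chart cannot show.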
In this section, we analyze the distribution of orders across the top 10 states. We visualize this distribution using a line plot to understand the variation in total orders across different states.
# Setting up the figure size and background color
plt.figure(figsize=(13, 6), facecolor='#F7A200')
plt.title('State-wise Distribution of Orders')
# Calculating total orders for the top 10 states
orders_state = flip_kart_df.groupby(['State'], as_index=False)['Orders'].sum().sort_values(by='Orders', ascending=False).head(10)
# Creating a line plot for state-wise distribution of orders using seaborn
a = sns.lineplot(data=orders_state, y='Orders', x='State', marker='o', color='blue')
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
# Turning on the grid lines and setting the color
a.grid(True, color='gray')
# Adding labels to the data points
for index, row in orders_state.iterrows():
    a.text(row['State'], row['Orders'], f'{row["Orders"]}', color='black', ha='left', va='center')
# Setting labels and formatting
plt.xlabel('State')
plt.ylabel('Total Orders')
plt.xticks(rotation=90)
plt.tight_layout()
In this section, we explore the distribution of total sales across the top 10 states. We visualize this distribution using a bar plot to understand the contribution of each state to the overall sales.
import matplotlib.pyplot as plt
import seaborn as sns
# Setting up the figure size and background color
plt.figure(figsize=(10, 6), facecolor='#F7A200')
# Setting the title of the plot
plt.title('Total Sales Distribution by States')
# Calculating total sales for the top 10 states
sales_state = flip_kart_df.groupby(['State'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10)
# Creating a bar plot for total sales distribution by states using seaborn
a = sns.barplot(data=sales_state, x='State', y='Amount', palette='Blues')
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
# Turning off the grid lines
a.grid(False)
# Rotating x-axis labels for better readability
plt.xticks(rotation=90)
# Adding labels to the bars
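# The DataFrame index no longer matches the bar order after sorting, so use the bar position instead
for pos, (_, row) in enumerate(sales_state.iterrows()):
    a.text(pos, row['Amount'], str(round(row['Amount'], 2)), color='black', ha="center", va='bottom')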
plt.show()
The analysis shows Uttar Pradesh, Maharashtra, and Karnataka as the top states by both orders and total sales. Their weight in overall sales can guide resource allocation, marketing campaigns, and expansion efforts, and strengthening market presence and customer relationships in these three states is a direct way to build on existing demand.
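Whether a state ranks the same by orders and by sales can be checked by ranking both totals in one table. The following is a minimal sketch on made-up data: the state labels A, B, and C are placeholders, and Orders_rank and Amount_rank are hypothetical column names.

```python
import pandas as pd

# Toy data; states and values are placeholders for illustration.
toy_df = pd.DataFrame({
    "State": ["A", "A", "B", "B", "C"],
    "Orders": [3, 2, 1, 1, 4],
    "Amount": [500, 400, 900, 800, 300],
})

# Total orders and total sales per state.
by_state = toy_df.groupby("State")[["Orders", "Amount"]].sum()

# Rank 1 is the highest total in each column.
by_state["Orders_rank"] = by_state["Orders"].rank(ascending=False, method="min").astype(int)
by_state["Amount_rank"] = by_state["Amount"].rank(ascending=False, method="min").astype(int)

print(by_state.sort_values("Amount_rank"))
```

States whose two ranks differ sharply are those where order volume and order value diverge.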
In this section, we identify the top 5 customers by sales in each state. We visualize them with one horizontal bar chart per state to compare sales among each state's top customers. Customers are grouped by name, so different customers who share a name within a state are counted together.
# Calculating sales by customer in each state
sales_by_customer = flip_kart_df.groupby(['State', 'Customer_name'])['Amount'].sum().reset_index()
# Sorting the data to find the top 5 customers by sales in each state
sales_by_customer_sorted = sales_by_customer.groupby('State').apply(lambda x: x.nlargest(5, 'Amount')).reset_index(drop=True)
# Extracting unique states
states = sales_by_customer_sorted['State'].unique()
num_states = len(states)
# Determining the number of rows for subplots
num_rows = (num_states + 2) // 3
# Creating subplots
fig, axes = plt.subplots(num_rows, 3, figsize=(12, 5*num_rows), sharex=True, facecolor='#F7A200')
fig.patch.set_facecolor('#F7A200')
# Defining colors for the bar plots
colors = sns.color_palette('Blues', n_colors=5)
# Iterating through states and plotting
for i, state in enumerate(states):
    row = i // 3
    col = i % 3
    data = sales_by_customer_sorted[sales_by_customer_sorted['State'] == state]
    ax = sns.barplot(x='Amount', y='Customer_name', hue='Customer_name', data=data, ax=axes[row, col], palette=colors, dodge=False)
    ax.set_title(f'Top 5 Customers by Sales in {state}')
    ax.set_xlabel('Total Sales')
    ax.set_ylabel('Customer Name')
    ax.set_facecolor('#F7A200')
    if ax.legend_:
        ax.legend_.remove()
# Removing unused subplots
for i in range(num_states, num_rows*3):
    fig.delaxes(axes.flatten()[i])
plt.tight_layout()
Analyzing the top 5 customers by sales across states offers valuable insights into regional buying patterns and customer loyalty, highlighting the key players driving sales within each area. By identifying the top customers in different states, businesses can tailor marketing strategies and customer engagement efforts to nurture relationships with these high-value customers, potentially increasing retention and fostering brand advocacy. Additionally, understanding the preferences and behaviors of top customers in various regions can inform product assortment decisions and personalized marketing initiatives, enabling businesses to meet customer needs more effectively and drive sustainable growth.
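Because customer names are not guaranteed to be unique, a top-N ranking keyed on 'User_ID' may be more reliable than one keyed on name alone. The sketch below is one possible variant, not the notebook's method: it uses made-up data with hypothetical names, a hypothetical TOP_N constant, and sort_values followed by groupby(...).head(...) in place of apply with nlargest.

```python
import pandas as pd

# Toy data; IDs, names, and amounts are made up. Users 1 and 2 share a name.
toy_df = pd.DataFrame({
    "State": ["X", "X", "X", "X", "Y", "Y"],
    "User_ID": [1, 2, 3, 1, 4, 5],
    "Customer_name": ["Asha", "Asha", "Ravi", "Asha", "Meera", "Kiran"],
    "Amount": [100, 400, 250, 200, 50, 75],
})

TOP_N = 2  # hypothetical cutoff; the notebook uses 5

# Total sales per customer, keyed on User_ID so that shared names stay separate.
sales_by_user = toy_df.groupby(
    ["State", "User_ID", "Customer_name"], as_index=False
)["Amount"].sum()

# Sort within each state by amount, then keep the first TOP_N rows per state.
top_customers = (
    sales_by_user.sort_values(["State", "Amount"], ascending=[True, False])
    .groupby("State")
    .head(TOP_N)
    .reset_index(drop=True)
)

print(top_customers)
```

Grouping by name alone would merge the two customers named Asha in state X into a single entry with a combined total of 700.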
In this section, we explore the distribution of marital status among customers. We visualize this distribution using a pie chart to understand the proportion of married and unmarried customers within the dataset.
# Setting up the figure size and background color
plt.figure(figsize=(8, 8), facecolor='#F7A200')
# Setting the title of the plot
plt.title('Marital Status Distribution Among Customers')
# Mapping the codes to labels before counting, so each label stays attached to its own code
# (assumes 0 = unmarried and 1 = married, as the labels used in this notebook do)
marital_counts = flip_kart_df['Marital_Status'].map({0: 'Unmarried', 1: 'Married'}).value_counts()
# Generating colors for the pie chart
colors = plt.cm.Blues(np.linspace(0.2, 1, len(marital_counts)))
# Creating a pie chart for marital status distribution
pie = plt.pie(marital_counts, labels=[f'{label} ({count})' for label, count in marital_counts.items()], colors=colors, startangle=90)
# Adding legend and setting its title and labels
plt.legend(title='Marital Status', labels=['Unmarried', 'Married'], loc='upper right')
# Setting the limits of the plot
plt.axis((-1.1, 1.1, -1.1, 1.1))
In this section, we analyze the distribution of sales based on marital status. We visualize this distribution using a bar plot to understand how sales vary between married and unmarried customers.
# Setting up the figure size and background color
plt.figure(figsize=(10,6), facecolor='#F7A200')
# Setting the title of the plot
plt.title('Total Sales Distribution by Marital Status')
# Calculating total sales by marital status and gender
sales_marital = flip_kart_df.groupby(['Marital_Status', 'Gender'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)
# Creating a bar plot for sales distribution by marital status using seaborn
a = sns.barplot(data=sales_marital, x='Marital_Status', y='Amount', hue='Gender', palette=('Blues'))
# Turning off the grid lines
a.grid(False)
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
In this section, we explore the distribution of customers across different occupations. We visualize this distribution using a horizontal bar plot to understand the representation of various occupations within the customer base.
import matplotlib.pyplot as plt
import seaborn as sns
# Setting up the figure size and background color
plt.figure(figsize=(10, 6), facecolor='#F7A200')
# Setting the title of the plot
plt.title('Distribution of Customers Across Occupations')
# Creating a count plot for the distribution of customers across occupations using seaborn
a = sns.countplot(data=flip_kart_df, y='Occupation', hue='Occupation', palette='Blues', order=flip_kart_df['Occupation'].value_counts().index)
# Turning off the grid lines
a.grid(False)
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
# Adding labels to the bars
for bars in a.containers:
    a.bar_label(bars)
# Adding legend
plt.legend(title='Occupation', loc='upper right')
plt.show()
The IT sector is the most common occupation among customers, with 1583 individuals, followed closely by the healthcare sector. Both are natural candidates for targeted marketing and tailored product offerings. The remaining occupations in the dataset, which include aviation, banking, and government among others, also contribute to the customer base and offer scope for segmentation and personalized engagement. The agriculture sector has the lowest customer representation, with only 283 individuals.
In this section, we examine the distribution of sales across different occupations. We visualize this distribution using a bar plot to understand how sales vary among various occupation categories.
# Setting up the figure size and background color
plt.figure(figsize=(10,6), facecolor='#f7a200')
# Setting the title of the plot
plt.title('Distribution of Sales Across Occupations')
# Calculating total sales by occupation
sales_occupation = flip_kart_df.groupby(['Occupation'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False)
# Creating a bar plot for sales distribution by occupation using seaborn
a = sns.barplot(data=sales_occupation, x='Occupation', y='Amount', palette='Blues', hue='Occupation', dodge=False)
# Turning off the grid lines
a.grid(False)
# Setting the background color of the plot
a.patch.set_facecolor('#f7a200')
# Rotating x-axis labels for better readability
plt.xticks(rotation=90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]), [Text(0, 0, 'IT Sector'), Text(1, 0, 'Healthcare'), Text(2, 0, 'Aviation'), Text(3, 0, 'Banking'), Text(4, 0, 'Govt'), Text(5, 0, 'Hospitality'), Text(6, 0, 'Media'), Text(7, 0, 'Automobile'), Text(8, 0, 'Chemical'), Text(9, 0, 'Lawyer'), Text(10, 0, 'Retail'), Text(11, 0, 'Food Processing'), Text(12, 0, 'Construction'), Text(13, 0, 'Textile'), Text(14, 0, 'Agriculture')])
The charts show that customers employed in the IT, healthcare, and aviation sectors generate the highest total sales. To build on this, marketing campaigns and product offerings can be tailored to the needs and preferences of customers in these occupations. Customers working in agriculture generate the lowest total sales, which is consistent with their low representation in the customer base. Growing sales in this segment would likely require offerings, partnerships, or value-added services aimed specifically at these customers. Monitoring how sales are distributed across occupations over time can show whether such efforts work and where resources are best allocated.
In this section, we explore the distribution of product count by category. We visualize this distribution using a horizontal bar plot to understand the representation of different product categories within the dataset.
import matplotlib.pyplot as plt
import seaborn as sns
# Setting up the figure size and background color
plt.figure(figsize=(10, 6), facecolor='#F7A200')
# Setting the title of the plot
plt.title('Distribution of Products by Category')
# Creating a count plot for the distribution of products by category using seaborn
a = sns.countplot(data=flip_kart_df, y='Product_Category', hue='Product_Category', palette='Blues', order=flip_kart_df['Product_Category'].value_counts().index)
# Turning off the grid lines
a.grid(False)
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
# Adding labels to the bars
for bars in a.containers:
    a.bar_label(bars)
# Adding legend
plt.legend(title='Product Category', loc='upper right')
plt.show()
Clothing and apparel is the most frequently purchased category, with 2655 order records, followed by food and electronic gadgets. Note that the count plot counts rows in the dataset, not units: each row is one purchase record, and the 'Orders' column holds the quantity. To build on the popularity of these categories, the business can broaden the product range, focus marketing on them, and prioritize their inventory. Hand and power tools has the lowest count, with only 26 records, suggesting limited demand for this category on the platform. Options here include reviewing the product range and running targeted promotions to test whether demand can be stimulated. Tracking these counts over time helps detect shifts in consumer preferences and adjust the assortment accordingly.
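The difference between counting records and summing quantities can be sketched as follows. The data is made up, and category_summary, records, and total_orders are hypothetical names; the sketch only assumes, as the dataset's columns suggest, that 'Orders' holds the quantity for each row.

```python
import pandas as pd

# Toy data; categories and quantities are made up for illustration.
toy_df = pd.DataFrame({
    "Product_Category": ["Clothing", "Clothing", "Food", "Tools"],
    "Orders": [1, 2, 4, 1],
})

# 'records' counts rows (what a count plot shows);
# 'total_orders' sums the 'Orders' column (the quantity ordered).
category_summary = (
    toy_df.groupby("Product_Category")["Orders"]
    .agg(records="count", total_orders="sum")
    .sort_values("records", ascending=False)
)

print(category_summary)
```

In this toy example Clothing has the most records, while Food has the highest summed quantity, so the two measures can rank categories differently.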
In this analysis, we examine the distribution of sales across different product categories. By visualizing the total sales generated by the top 10 product categories, we gain insights into the sales performance of various product categories within the dataset. These insights can inform strategic decision-making and marketing strategies, helping businesses identify high-performing product categories and areas for potential growth or optimization.
import matplotlib.pyplot as plt
import seaborn as sns
# Calculating the top 10 product categories by total sales
sales_product = flip_kart_df.groupby(['Product_Category'], as_index=False)['Amount'].sum().sort_values(by='Amount', ascending=False).head(10)
# Calculating the number of unique product categories
num_categories = len(sales_product['Product_Category'].unique())
# Generating a color palette for the plot
palette = sns.color_palette("Blues", n_colors=num_categories)
# Setting up the figure size and background color
plt.figure(figsize=(10,6), facecolor='#F7A200')
plt.title('Distribution of Sales by Product')
# Creating a bar plot for the distribution of sales by product using seaborn
a = sns.barplot(data=sales_product, x='Product_Category', y='Amount', hue='Product_Category', palette=palette)
# Turning off the grid lines
a.grid(False)
# Setting the background color of the plot
a.patch.set_facecolor('#F7A200')
# Rotating x-axis labels for better readability
plt.xticks(rotation=90)
# Adding legend
plt.legend(title='Product Category', loc='upper right')
plt.show()
The charts show that most sales come from the Food, Clothing, and Electronics categories. Food is the highest-selling category, and Stationery is the lowest of the ten categories shown. To build on this, the business can focus on these top-performing categories, investing in product quality, variety, and promotions to meet consumer demand.
Understanding the popularity and sales performance of different product categories can guide businesses in making informed decisions about inventory management, pricing strategies, and marketing campaigns. By allocating resources and efforts strategically, businesses can maximize sales opportunities within high-performing categories while also exploring avenues for improving sales in underperforming categories.
Continuous monitoring and analysis of sales distribution across product categories are essential for identifying market trends, consumer preferences, and emerging opportunities. By staying attentive to shifts in consumer behavior and market dynamics, businesses can adapt their strategies proactively, innovate their offerings, and maintain a competitive edge in the retail landscape, ultimately driving business growth and profitability.
In this analysis, we identify the top 5 product categories by sales in different states. By visualizing the total sales of each product category within the dataset, segmented by state, we gain insights into regional sales trends and consumer preferences. This analysis helps businesses understand which product categories perform best in each state, allowing for targeted marketing efforts and inventory management strategies.
import matplotlib.pyplot as plt
import seaborn as sns
# Calculating total sales by product category in each state
sales_by_product = flip_kart_df.groupby(['State', 'Product_Category'])['Amount'].sum().reset_index()
# Sorting the data to find the top 5 product categories by sales in each state
sales_by_product_sorted = sales_by_product.groupby('State').apply(lambda x: x.nlargest(5, 'Amount')).reset_index(drop=True)
# Extracting unique states and calculating the number of rows needed for subplots
states = sales_by_product_sorted['State'].unique()
num_states = len(states)
num_rows = (num_states + 2) // 3
# Creating subplots for visualization
fig, axes = plt.subplots(num_rows, 3, figsize=(12, 5*num_rows), sharex=True, facecolor='#F7A200')
fig.patch.set_facecolor('#F7A200')
# Iterating over states to plot top 5 product categories by sales
for i, state in enumerate(states):
    row = i // 3
    col = i % 3
    data = sales_by_product_sorted[sales_by_product_sorted['State'] == state]
    ax = sns.barplot(x='Amount', y='Product_Category', data=data, ax=axes[row, col], palette='Blues', hue='Product_Category')
    ax.set_title(f'Top 5 categories by Sales in {state}')
    ax.set_xlabel('Total Sales')
    ax.set_ylabel('Product Category')
    ax.set_facecolor('#F7A200')
    # Removing the legend for each subplot, if seaborn created one
    if ax.get_legend() is not None:
        ax.get_legend().remove()
# Removing excess subplots if necessary
for i in range(num_states, num_rows*3):
    fig.delaxes(axes.flatten()[i])
plt.tight_layout()
plt.show()
Examining the top 5 product categories by sales across states shows which categories generate the most sales in each region and how that mix varies from state to state. The data does not explain why these differences arise; cultural and economic factors are plausible drivers but are not measured here.
Based on these insights, businesses can tailor their marketing strategies and product offerings to align with the preferences of consumers in each state. By understanding regional variations in demand, businesses can optimize their inventory management, target advertising campaigns effectively, and capitalize on emerging market trends to drive sales growth and enhance customer satisfaction.
Continuous monitoring of sales performance across different product categories and states enables businesses to adapt quickly to changing market conditions and consumer behavior. By leveraging data-driven insights, businesses can make informed decisions, identify opportunities for expansion, and stay competitive in the dynamic retail landscape, ultimately fostering long-term success and profitability.
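For ongoing monitoring, a state-by-category table is often easier to scan than a grid of charts. The sketch below is one way to build such a table with pivot_table, on made-up data. The state labels X and Y and the names state_category and top_category are hypothetical.

```python
import pandas as pd

# Toy data; states, categories, and amounts are made up for illustration.
toy_df = pd.DataFrame({
    "State": ["X", "X", "X", "Y", "Y"],
    "Product_Category": ["Food", "Clothing", "Food", "Food", "Auto"],
    "Amount": [100, 300, 150, 80, 200],
})

# Rows are states, columns are categories, cells are total sales (0 where none).
state_category = toy_df.pivot_table(
    index="State",
    columns="Product_Category",
    values="Amount",
    aggfunc="sum",
    fill_value=0,
)

# The best-selling category in each state.
top_category = state_category.idxmax(axis=1)

print(state_category)
print(top_category)
```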
In this analysis, we explore how the gender distribution varies across different states. By visualizing the number of customers by gender in each state, we gain insights into regional gender demographics and potential variations in consumer behavior. This analysis helps businesses understand the gender composition of their customer base in different regions, enabling targeted marketing efforts and tailored product offerings.
# Calculating gender distribution across different states
gender_distribution = flip_kart_df.groupby(['State', 'Gender']).size().reset_index(name='Count')
# Setting up the figure size and background color
plt.figure(figsize=(12, 8), facecolor='#f7a200')
plt.title('Gender Distribution Across Different States')
# Creating a bar plot for gender distribution across states using seaborn
ax = sns.barplot(data=gender_distribution, x='State', y='Count', hue='Gender', palette='Blues')
# Setting labels and formatting
plt.xlabel('State')
plt.ylabel('Number of Customers')
plt.grid(False)
plt.gca().patch.set_facecolor('#f7a200')
plt.legend(title='Gender', loc='upper right')
plt.xticks(rotation=90)
# Adding labels to the bars
for p in ax.patches:
    # Skip zero-height patches, which carry no count to display
    if p.get_height() > 0:
        ax.annotate(f'{p.get_height():.0f}',
                    (p.get_x() + p.get_width() / 2., p.get_height()),
                    ha='center',
                    va='baseline',
                    xytext=(0, 5),
                    textcoords='offset points')
# Preventing the rotated state names from being clipped, then rendering the chart
plt.tight_layout()
plt.show()
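Raw counts in the chart above depend heavily on how many customers each state has, so a large state can look more skewed than it really is. As a complementary view, the sketch below converts the counts into within-state percentages. It is only an illustration: `sample_df`, the gender codes `'F'` and `'M'`, and the helper `gender_share_by_state` are assumptions introduced here, not part of the original notebook; only the `State` and `Gender` column names come from the code above.

```python
import pandas as pd

# Hypothetical sample standing in for flip_kart_df; the rows and the
# 'F'/'M' gender codes are made up for illustration.
sample_df = pd.DataFrame({
    'State': ['Uttar Pradesh', 'Uttar Pradesh', 'Uttar Pradesh', 'Kerala', 'Kerala'],
    'Gender': ['F', 'F', 'M', 'F', 'M'],
})


def gender_share_by_state(df):
    """Return the percentage of customers of each gender within each state."""
    counts = df.groupby(['State', 'Gender']).size().reset_index(name='Count')
    # Divide each count by its state's total so states of different sizes are comparable
    state_totals = counts.groupby('State')['Count'].transform('sum')
    counts['Share_Pct'] = 100 * counts['Count'] / state_totals
    return counts


gender_share = gender_share_by_state(sample_df)
print(gender_share)
```

On the real data, the same function could be called as `gender_share_by_state(flip_kart_df)` and the `Share_Pct` column plotted in place of `Count`.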
In this analysis, we investigate whether there are any noticeable patterns in the gender distribution based on product categories. By visualizing the number of customers by gender within each product category, we aim to identify any significant differences or trends in consumer behavior across different product segments.
# Calculating gender distribution by product category
gender_by_category = flip_kart_df.groupby(['Product_Category', 'Gender']).size().reset_index(name='Count')
# Setting up the figure size and background color
plt.figure(figsize=(12, 8), facecolor='#f7a200')
plt.title('Gender Distribution by Product Category')
# Creating a color palette with one shade per gender
palette = sns.color_palette("Blues", n_colors=gender_by_category['Gender'].nunique())
# Creating a bar plot for gender distribution by product category using seaborn
ax = sns.barplot(data=gender_by_category, x='Product_Category', y='Count', hue='Gender', palette=palette)
# Setting labels and formatting
plt.xlabel('Product Category')
plt.ylabel('Number of Customers')
plt.grid(False)
plt.gca().patch.set_facecolor('#f7a200')
plt.legend(title='Gender', loc='upper right')
plt.xticks(rotation=90)
# Adding labels to the bars
for p in ax.patches:
    # Skip zero-height patches, which carry no count to display
    if p.get_height() > 0:
        ax.annotate(f'{p.get_height():.0f}',
                    (p.get_x() + p.get_width() / 2., p.get_height()),
                    ha='center',
                    va='baseline',
                    xytext=(0, 5),
                    textcoords='offset points')
# Preventing the rotated category names from being clipped, then rendering the chart
plt.tight_layout()
plt.show()
The gender distribution across product categories shows a consistent pattern: in most categories, female customers outnumber male customers. This holds for clothing, cosmetics, household items, and electronics. The notable exceptions are hand and power tools and books, where only male customers appear in the data. This suggests a clear preference among male consumers for these products.
Based on these insights, businesses can tailor their product offerings and marketing strategies to cater to the preferences of different gender demographics. By understanding the gender-specific preferences within each product category, businesses can optimize their product assortments, enhance customer engagement, and drive sales growth effectively.
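Claims such as "a higher proportion of female customers" or "only male customers" are easier to verify from a table of shares than from bar heights. The sketch below shows one way to compute them with `pd.crosstab`. It is an illustration under stated assumptions: `sample_df`, its rows, the `'F'`/`'M'` gender codes, and the helper `gender_share_by_category` are hypothetical; only the `Product_Category` and `Gender` column names come from the notebook.

```python
import pandas as pd

# Hypothetical sample standing in for flip_kart_df; the rows and the
# 'F'/'M' gender codes are made up for illustration.
sample_df = pd.DataFrame({
    'Product_Category': ['Clothing', 'Clothing', 'Clothing', 'Books', 'Books'],
    'Gender': ['F', 'F', 'M', 'M', 'M'],
})


def gender_share_by_category(df):
    """Return (share, single_gender).

    share: percentage of customers of each gender within each category.
    single_gender: categories in which only one gender appears.
    """
    # Rows are categories, columns are genders, cells are customer counts
    counts = pd.crosstab(df['Product_Category'], df['Gender'])
    # Divide each row by its total so every category sums to 100
    share = counts.div(counts.sum(axis=1), axis=0) * 100
    # A category is single-gender when exactly one column has a non-zero count
    single_gender = counts.index[(counts > 0).sum(axis=1) == 1].tolist()
    return share, single_gender


share, single_gender = gender_share_by_category(sample_df)
print(share.round(1))
print('Single-gender categories:', single_gender)
```

Run against the full dataset, the second return value would list the categories where one gender accounts for every recorded customer, which is a direct check on the exceptions noted above.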
In this extensive analysis of Flipkart sales data, we conducted a thorough examination of various dimensions of consumer behavior and market trends. Through meticulous data preparation and in-depth exploration, we uncovered valuable insights that can inform strategic decision-making and drive business growth in the e-commerce sector.
Gender Distribution: Our analysis revealed a prevalent trend of female dominance across multiple product categories, indicating significant opportunities for targeted marketing and product development strategies tailored to female consumers.
Age Demographics: By analyzing sales performance across different age groups, we identified specific age cohorts, such as the 26-35 age group, contributing significantly to total sales, highlighting the importance of understanding demographic preferences in marketing efforts.
Product Preferences: Detailed examination of product category sales patterns unveiled distinct preferences among male and female consumers, with certain categories, like clothing and electronics, attracting a higher proportion of female customers, while others, such as hand and power tools, appealed predominantly to male consumers.
Regional Variations: Our analysis also uncovered variations in sales performance across different states, with Uttar Pradesh emerging as a key market with the highest number of orders. Understanding regional dynamics is crucial for tailoring marketing strategies to local preferences and maximizing sales potential.
Targeted Marketing: The insights gleaned from our analysis provide actionable intelligence for crafting targeted marketing campaigns that resonate with specific demographic segments and regional preferences, thereby enhancing customer engagement and driving sales growth.
Product Assortment Optimization: Businesses can leverage the identified product preferences to optimize their product assortments, ensuring alignment with consumer demand and enhancing overall customer satisfaction and retention.
Competitive Advantage: By harnessing data-driven insights, Flipkart and other e-commerce businesses can gain a competitive edge in a dynamic marketplace, fostering innovation and agility in response to evolving consumer trends and preferences.
In conclusion, our analysis serves as a valuable resource for stakeholders in the e-commerce industry, offering actionable insights to guide strategic decision-making and capitalize on emerging market opportunities. By leveraging data analytics, businesses can navigate the complexities of the online retail landscape with confidence and drive sustainable growth and profitability.